This is the output for the comparison sc.250425.K562.default.Xu.cv. The following analyses evaluate how well predictions of CRE-gene pairs agree with the experimental data. The following input files were used:
Experimental data: EPCrisprBenchmark_ensemble_data_GRCh38.intGENCODEv43.1Mb.tsv.gz
Predictions: encode_e2g_predictions.tsv.gz, encode_e2g_predictions.tsv.gz, encode_e2g_predictions.tsv.gz, encode_e2g_predictions.tsv.gz, encode_e2g_predictions.tsv.gz, pairs.STARE.res.tsv.gz, pairs.E2G.res.tsv.gz, pairs.E2G.res.tsv.gz, pairs.E2G.res.tsv.gz, pairs.E2G.res.tsv.gz, pairs.E2G.res.tsv.gz, pairs.E2G.res.tsv.gz, pairs.E2G.res.tsv.gz, pairs.E2G.res.150000.250317.tsv.gz
The following parameters from the config file config/pred_config.250425.K562.default.txt were used to overlap predictions with the experimental data and to assess the performance of predictors. If no config file was provided, the config was generated using default values. Using a prediction config file is strongly recommended, because it controls how each predictor is treated.
For each predictor, the number of CRISPR enhancer-gene pairs that overlap predicted enhancer-gene pairs is counted. CRISPR enhancer-gene pairs that do not overlap any predicted pair are considered not predicted. A large fraction of CRISPR E-G pairs without overlapping predictions leads to poor performance.
Number of CRISPR enhancer-gene pairs overlapping enhancer-gene pairs in predictions.
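The overlap count can be illustrated with a minimal sketch. This is not the pipeline's own code: it assumes each pair is a hypothetical `Pair(chrom, start, end, gene)` record with half-open coordinates, and that a CRISPR pair counts as predicted when any prediction for the same gene overlaps its element by at least one base pair.

```python
from typing import NamedTuple


class Pair(NamedTuple):
    """Hypothetical E-G pair record with a half-open [start, end) element interval."""
    chrom: str
    start: int
    end: int
    gene: str


def overlaps(a: Pair, b: Pair) -> bool:
    """True if both pairs target the same gene and their elements share >= 1 bp."""
    return (a.chrom == b.chrom and a.gene == b.gene
            and a.start < b.end and b.start < a.end)


def count_overlapping_pairs(crispr: list[Pair], predictions: list[Pair]) -> tuple[int, int]:
    """Return (overlapping, not_predicted) counts for the CRISPR pairs."""
    overlapping = sum(
        1 for c in crispr if any(overlaps(c, p) for p in predictions)
    )
    return overlapping, len(crispr) - overlapping


if __name__ == "__main__":
    crispr = [Pair("chr1", 100, 200, "GATA1"),
              Pair("chr1", 500, 600, "GATA1"),
              Pair("chr2", 100, 200, "MYC")]
    predictions = [Pair("chr1", 150, 300, "GATA1"),
                   Pair("chr2", 100, 200, "TAL1")]
    print(count_overlapping_pairs(crispr, predictions))  # (1, 2)
```

The nested scan is quadratic, which is fine for a sketch; a genome-scale version would typically use a sorted sweep or an interval tree instead.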
Precision-recall (PR) curves are used to compare the performance of different predictors on the experimental data. The area under the PR curve (AUPRC) summarizes a predictor's performance in a single metric.
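As an illustration of how an AUPRC can be computed, the sketch below ranks E-G pairs by score, records precision and recall at each distinct threshold, and sums precision times the recall gained at each step (the step-wise "average precision" estimate). This is an assumption about the method: the report's plotting code may interpolate the curve differently, so its values can differ slightly. Labels are 1 for CRISPR positives and 0 for negatives.

```python
def pr_curve(scores, labels):
    """(threshold, precision, recall) at each distinct score, highest first.

    Pairs with score >= threshold are called positive; labels are 1/0.
    """
    ranked = sorted(zip(scores, labels), key=lambda x: -x[0])
    total_pos = sum(labels)
    points, tp, fp, i = [], 0, 0, 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Count every pair tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            tp += ranked[i][1]
            fp += 1 - ranked[i][1]
            i += 1
        points.append((threshold, tp / (tp + fp), tp / total_pos))
    return points


def auprc(scores, labels):
    """Step-wise AUPRC: sum of precision times the recall gained at each threshold."""
    area, prev_recall = 0.0, 0.0
    for _, precision, recall in pr_curve(scores, labels):
        area += (recall - prev_recall) * precision
        prev_recall = recall
    return area


if __name__ == "__main__":
    print(round(auprc([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]), 4))  # 0.8333
```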
Precision-recall curves for all predictors in all matching experimental cell types. Dots represent the alpha cutoff values specified in the pred_config file. If no alpha was set, the minimum predictor score in the predictions was used by default (the maximum for inverse predictors). Distance to TSS was added as a baseline predictor and computed from the provided ‘gene universe’.
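Each dot corresponds to the precision and recall obtained when pairs at or beyond the alpha cutoff are called positive. The following minimal sketch assumes that higher scores mean stronger predictions unless `inverse=True` (as for distance to TSS), and mirrors the caption's default of the minimum score (maximum for inverse predictors) when no alpha is given. The function name `pr_at_alpha` is hypothetical.

```python
import math


def pr_at_alpha(scores, labels, alpha=None, inverse=False):
    """Precision and recall when pairs at or beyond the alpha cutoff are called positive.

    Regular predictors call score >= alpha; inverse predictors (e.g. distance
    to TSS) call score <= alpha. With no alpha, the minimum score (maximum for
    inverse predictors) is used, so every pair is called positive.
    """
    if alpha is None:
        alpha = max(scores) if inverse else min(scores)
    called = [(s <= alpha) if inverse else (s >= alpha) for s in scores]
    tp = sum(1 for c, is_pos in zip(called, labels) if c and is_pos)
    n_called = sum(called)
    precision = tp / n_called if n_called else math.nan
    recall = tp / sum(labels)
    return precision, recall


if __name__ == "__main__":
    print(pr_at_alpha([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0], alpha=0.75))  # (0.5, 0.5)
    print(pr_at_alpha([1000, 5000, 20000], [1, 0, 1], alpha=6000, inverse=True))  # (0.5, 0.5)
```

With the default alpha every pair is called positive, so the dot sits at a recall of 1 and a precision equal to the fraction of positives.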
Precision-recall performance summary for predictors. The table shows the area under the PR curve (AUPRC), the precision at the specified thresholds (if any) and the precision at a minimum sensitivity (recall) of 0.7.
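One reasonable reading of "precision at a minimum sensitivity of 0.7" is the precision at the highest score threshold whose recall reaches 0.7. The sketch below implements that reading; it is an assumption, and the pipeline's exact handling of ties and interpolation may differ. It assumes at least one positive pair.

```python
import math


def precision_at_min_recall(scores, labels, min_recall=0.7):
    """(precision, threshold) at the highest threshold whose recall >= min_recall.

    Labels are 1 for positives and 0 for negatives; pairs with score >=
    threshold are called positive. Returns (nan, None) if never reached.
    """
    ranked = sorted(zip(scores, labels), key=lambda x: -x[0])
    total_pos = sum(labels)
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        tp += label
        fp += 1 - label
        # Only evaluate a threshold once all pairs tied at this score are counted.
        tied_with_next = i + 1 < len(ranked) and ranked[i + 1][0] == score
        if not tied_with_next and tp / total_pos >= min_recall:
            return tp / (tp + fp), score
    return math.nan, None


if __name__ == "__main__":
    print(precision_at_min_recall([0.9, 0.8, 0.7, 0.6, 0.5], [1, 0, 1, 1, 0]))  # (0.75, 0.6)
```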
Receiver Operating Characteristic (ROC) curves offer an alternative way to compare performance, based on the true positive rate and false positive rate of each predictor across score thresholds.
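A minimal sketch of the ROC computation: pairs are ranked by score, the false positive rate and true positive rate are recorded at each distinct threshold, and the area under the curve is integrated with the trapezoidal rule. Labels are assumed to be 1 for positives and 0 for negatives, with at least one of each.

```python
def roc_points(scores, labels):
    """(FPR, TPR) at every distinct threshold, starting from (0, 0)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda x: -x[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        tp += label
        fp += 1 - label
        # Emit a point only after the last pair tied at this score.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos))
    return points


def auroc(scores, labels):
    """Trapezoidal area under the ROC curve."""
    pts = roc_points(scores, labels)
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


if __name__ == "__main__":
    print(auroc([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))  # 0.75
```

With ties handled this way, the trapezoidal AUROC equals the probability that a randomly chosen positive outscores a randomly chosen negative (ties counting one half).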
ROC curves for all predictors in all matching experimental cell types. Distance to TSS and nearest genes/TSS were added as baseline predictors and computed from the provided ‘gene universe’.
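The two baselines can be sketched as follows, under stated assumptions: distance is measured from the element midpoint to the gene's TSS, and the gene universe is a hypothetical dict mapping gene to `(chrom, tss_position)`. The pipeline's exact definition (for example midpoint versus nearest element edge) may differ. Distance is an inverse predictor (smaller means more likely), while the nearest-TSS baseline is binary.

```python
def element_midpoint(start, end):
    """Integer midpoint of an element interval."""
    return (start + end) // 2


def distance_to_tss(chrom, start, end, gene, universe):
    """Absolute distance from the element midpoint to the gene's TSS.

    `universe` is a hypothetical dict: gene -> (chrom, tss_position).
    """
    tss_chrom, tss = universe[gene]
    if tss_chrom != chrom:
        raise ValueError(f"{gene} is not on {chrom}")
    return abs(element_midpoint(start, end) - tss)


def nearest_tss_gene(chrom, start, end, universe):
    """Gene on the same chromosome whose TSS is closest to the element midpoint.

    Raises ValueError (from min) if the universe has no gene on `chrom`.
    """
    mid = element_midpoint(start, end)
    candidates = [(abs(mid - tss), g) for g, (c, tss) in universe.items() if c == chrom]
    return min(candidates)[1]


def nearest_tss_score(chrom, start, end, gene, universe):
    """Binary baseline: 1 if `gene` has the nearest TSS to the element, else 0."""
    return int(nearest_tss_gene(chrom, start, end, universe) == gene)


if __name__ == "__main__":
    universe = {"A": ("chr1", 1000), "B": ("chr1", 5000), "C": ("chr2", 1200)}
    print(distance_to_tss("chr1", 1100, 1300, "B", universe))  # 3800
    print(nearest_tss_score("chr1", 1100, 1300, "A", universe))  # 1
```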
Each predictor listed in the prediction data is plotted against the effect size of enhancer perturbations reported in the experimental data (e.g. percent change in expression). These plots give an intuitive view of how well a predictor is associated with the effects observed in CRISPRi enhancer screens.
Predictors versus CRISPRi effect size. Effect size is defined as percent change in target gene expression upon CRISPRi perturbation of an enhancer. Effect size values are taken from the ‘EffectSize’ column in the experimental data, while predictor scores correspond to scores from the prediction files. Numbers show Spearman’s rank correlation coefficient (rho) between effect size and predictor scores.
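Spearman's rho is the Pearson correlation of the ranks of the two variables, with tied values given their average rank. The self-contained sketch below shows the computation; in practice an existing implementation such as `scipy.stats.spearmanr` would usually be called instead. Because knocking down a true enhancer with CRISPRi typically lowers expression, effect sizes of true enhancers tend to be negative, so a well-performing predictor would be expected to show a negative rho.

```python
def average_ranks(values):
    """1-based ranks, with tied values receiving the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    effect_size = [-40, -25, -10, 0, 5]   # percent change in expression
    scores = [0.9, 0.6, 0.3, 0.1, 0.05]   # predictor scores
    print(spearman_rho(effect_size, scores))  # -1.0
```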
The scores of each predictor are compared between experimental positives and negatives as another assessment of how well the predictor distinguishes true enhancer-gene pairs from negatives.
Predictor scores vs. experimental outcome for all predictors. Each point represents one E-G pair in the experimental data. Predictor values of 0 or infinity might correspond to E-G pairs that were not found in the predictions and whose predictor values were filled in according to the prediction config file.
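The fill-in step and the positive-versus-negative comparison can be sketched as below. This assumes pairs are keyed by a hypothetical `(element_id, gene)` tuple and that the fill value comes from the prediction config, for example 0 for a regular predictor or `math.inf` for an inverse predictor such as distance to TSS. Medians are used here only as a simple summary of the two score distributions.

```python
from statistics import median


def fill_missing_scores(crispr_pairs, predicted_scores, fill_value):
    """Score every CRISPR pair, using fill_value for pairs absent from predictions.

    `crispr_pairs` is a list of hypothetical (element_id, gene) keys and
    `predicted_scores` maps such keys to predictor scores.
    """
    return [predicted_scores.get(pair, fill_value) for pair in crispr_pairs]


def median_by_outcome(scores, labels):
    """Median predictor score among experimental positives and among negatives."""
    positives = [s for s, is_pos in zip(scores, labels) if is_pos]
    negatives = [s for s, is_pos in zip(scores, labels) if not is_pos]
    return median(positives), median(negatives)


if __name__ == "__main__":
    pairs = [("e1", "GATA1"), ("e2", "GATA1"), ("e3", "MYC"), ("e4", "MYC")]
    predicted = {("e1", "GATA1"): 0.8, ("e3", "MYC"): 0.6, ("e4", "MYC"): 0.1}
    scores = fill_missing_scores(pairs, predicted, fill_value=0.0)
    print(scores)  # [0.8, 0.0, 0.6, 0.1]
    print(median_by_outcome(scores, [1, 0, 1, 0]))
```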
Enhancer-gene pairs are binned based on their distance to TSS and predictor performance is assessed for each bin.
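A minimal sketch of the binning, assuming half-open bins in kb with hypothetical edges of 0, 10, 100, 250 and 1000 kb (the report's own bins may differ). Each bin's subset of pairs would then be evaluated separately, for example by computing an AUPRC per bin.

```python
from bisect import bisect_right


def distance_bin(distance_bp, edges_kb=(0, 10, 100, 250, 1000)):
    """Label of the half-open kb bin [lo, hi) that contains distance_bp.

    The default edges are hypothetical placeholders.
    """
    kb = distance_bp / 1000
    i = bisect_right(edges_kb, kb) - 1
    if i < 0 or i >= len(edges_kb) - 1:
        raise ValueError(f"{distance_bp} bp is outside the binned range")
    return f"[{edges_kb[i]},{edges_kb[i + 1]})"


def group_by_bin(distances_bp, edges_kb=(0, 10, 100, 250, 1000)):
    """Map each bin label to the indices of the E-G pairs that fall into it."""
    groups = {}
    for idx, d in enumerate(distances_bp):
        groups.setdefault(distance_bin(d, edges_kb), []).append(idx)
    return groups


if __name__ == "__main__":
    print(group_by_bin([5000, 20000, 8000]))  # {'[0,10)': [0, 2], '[10,100)': [1]}
```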
Area under the Precision-Recall Curve (AUPRC) for different distance to TSS bins (kb).
Precision-Recall curves for different distance to TSS bins (kb).
Predictor scores versus experimental outcome for different distance to TSS bins (kb).
CRISPR effect size vs predictor scores for different distance to TSS bins (kb).
If any gene or enhancer features are provided, versions of the PR curves, predictor vs. experiment plots and effect size plots faceted by these features are created.
The correlation of predictor scores with each other is investigated for the E-G pairs in the experimental data.
Correlation of scores between predictors for experimental E-G pairs.
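A sketch of the pairwise comparison, assuming each predictor's scores are aligned to the same ordered list of experimental E-G pairs. It uses Pearson correlation for simplicity; the report may use a different coefficient, and infinite fill values for inverse predictors would need to be excluded or rank-transformed before computing it.

```python
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def correlation_matrix(scores_by_predictor):
    """Pairwise correlations between predictors scored on the same E-G pairs.

    `scores_by_predictor` maps a predictor name to a list of scores, all
    aligned to the same ordering of experimental E-G pairs.
    """
    names = list(scores_by_predictor)
    matrix = {(a, a): 1.0 for a in names}
    for a, b in combinations(names, 2):
        r = pearson(scores_by_predictor[a], scores_by_predictor[b])
        matrix[(a, b)] = matrix[(b, a)] = r
    return matrix


if __name__ == "__main__":
    m = correlation_matrix({"p1": [1, 2, 3, 4], "p2": [2, 4, 6, 8], "p3": [4, 3, 2, 1]})
    print(round(m[("p1", "p3")], 6))  # -1.0
```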
Different features of the experimental data are investigated.
Distance to TSS distributions for all E-G pairs in experimental data. E-G pairs are partitioned according to whether they were identified as enhancer-gene interactions (positives) or negatives.
A plot showing the number of experimentally tested candidate enhancers overlapping provided genomic features. If no features were provided, this plot is not generated.